103 research outputs found

phyloXML: XML for evolutionary biology and comparative genomics

Author: Christian M Zmasek
CM Zmasek
CM Zmasek
CM Zmasek
DR Maddison
E Antezana
J Felsenstein
J Felsenstein
J Leebens-Mack
JA Eisen
JC Avise
JE Stajich
Mira V Han
MW Peterson
N Cannata
N Goto
PJ Cock
Q Zhang
R Gilmour
T Bray
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and oftentimes support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data, such as gene names and functional annotations, as well as events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacities to incorporate such annotations of different data types. Results: We developed an XML language, named phyloXML, for describing evolutionary trees, as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, conversion, and visualization of phyloXML formatted data. Conclusion: PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, as well as tools implementing and supporting phyloXML, is available at http://www.phyloxml.org.
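
As an illustration of the kind of annotated tree the abstract describes, the following is a minimal sketch that reads a small phyloXML-style document with Python's standard library. The toy document, the function name `leaf_annotations`, and the choice of fields to extract are assumptions made for this example; the sketch does not validate against the XSD schema and is not one of the tools the authors mention.

```python
import xml.etree.ElementTree as ET

# Namespace used by phyloXML documents.
NS = {"phy": "http://www.phyloxml.org"}

# Hypothetical toy tree with two annotated leaves (not from the paper).
DOC = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <name>A</name>
        <branch_length>0.102</branch_length>
        <taxonomy><scientific_name>Homo sapiens</scientific_name></taxonomy>
      </clade>
      <clade>
        <name>B</name>
        <branch_length>0.23</branch_length>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def leaf_annotations(xml_text):
    """Return name, branch length and species for every leaf clade."""
    root = ET.fromstring(xml_text)
    leaves = []
    for clade in root.iter("{http://www.phyloxml.org}clade"):
        # A clade that contains child clades is an internal node; skip it.
        if clade.find("phy:clade", NS) is not None:
            continue
        length = clade.findtext("phy:branch_length", namespaces=NS)
        leaves.append({
            "name": clade.findtext("phy:name", namespaces=NS),
            "branch_length": float(length) if length is not None else None,
            "species": clade.findtext(
                "phy:taxonomy/phy:scientific_name", namespaces=NS),
        })
    return leaves


if __name__ == "__main__":
    for leaf in leaf_annotations(DOC):
        print(leaf)
```

Because each annotation is its own element, a missing item (such as the taxonomy of leaf B) simply comes back as `None` rather than breaking the parse.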

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PHY·FI: fast and easy online creation and manipulation of phylogeny color figures

Author: CM Zmasek
DA Ruths
G Palidwor
G Perrière
G Savva
G Trooskens
J Felsenstein
J Müller
Jakob Fredslund
R Chenna
R Ihaka
RDM Page
S Kumar
T Hughes
U Rost
V Makarenkov
WP Maddison
Publication venue: BioMed Central
Publication date: 01/06/2006

BACKGROUND: The need to depict a phylogeny, or some other kind of abstract tree, is very frequently experienced by researchers from a broad range of biological and computational disciplines. Thousands of papers and talks include phylogeny figures, and often during everyday work, one would like to quickly get a graphical display of, e.g., the phylogenetic relationship between a set of sequences as calculated by an alignment program such as ClustalW or the phylogenetic package Phylip. A wealth of software tools capable of tree drawing exists; most are comprehensive packages that also perform various types of analysis, and hence they are available only for download and installing. Some online tools exist, too. RESULTS: This paper presents an online tool, PHY·FI, which encompasses all the qualities of existing online programs and adds functionality to hopefully eliminate the need for post-processing the phylogeny figure in some other general-purpose graphics program. PHY·FI is versatile, easy-to-use and fast, and supports comprehensive graphical control, several download image formats, and the possibility of dynamically collapsing groups of nodes into named subtrees (e.g. "Primates"). The user can create a color figure from any phylogeny, or other kind of tree, represented in the widely used parenthesized Newick format. CONCLUSION: PHY·FI is fast and easy to use, yet still offers full color control, tree manipulation, and several image formats. It does not require any downloading and installing, and thus any internet user regardless of computer skills, and computer platform, can benefit from it. PHY·FI is free for all and is available from this web address
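
The parenthesized Newick format mentioned in the abstract can be read with a short recursive parser. The sketch below is an assumption-laden illustration, not PHY·FI's implementation: it handles only unquoted labels and optional branch lengths, and the names `parse_newick` and `leaf_names` are invented for this example.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Supports unquoted labels and optional ':length' suffixes only;
    quoted labels and bracketed comments are not handled.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError(f"unexpected character at {pos}")
        # Optional label (may be empty, e.g. for unnamed internal nodes).
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": label, "length": length, "children": children}

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at {pos}")
    return tree


def leaf_names(node):
    """Collect leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


if __name__ == "__main__":
    tree = parse_newick("((A:0.1,B:0.2)Primates:0.3,C:0.4);")
    print(leaf_names(tree))
```

In this toy input the internal node carries the label "Primates", the kind of named subtree that the abstract says can be collapsed in the figure.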

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dendroscope: An interactive viewer for large phylogenetic trees

Author: Christian Rausch
CM Zmasek
D Huson
Daniel C Richter
Daniel H Huson
DH Huson
DR Maddison
DR Maddison
F Chevenet
J Bingham
J Felsenstein
J Felsenstein
Markus Franz
R Christen
RDM Page
Regula Rupp
S Kumar
T Munzner
Tobias Dezulian
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Research in evolution requires software for visualizing and editing phylogenetic trees for increasingly large datasets, such as those that arise in expression analysis or metagenomics. It would be desirable to have a program that provides these services in an efficient and user-friendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are difficult to install or are not available for all common operating systems. Results: We have developed a new program, Dendroscope, for the interactive visualization and navigation of phylogenetic trees. The program provides all standard tree visualizations and is optimized to run interactively on trees containing hundreds of thousands of taxa. The program provides tree editing and graphics export capabilities. To support the inspection of large trees, Dendroscope offers a magnification tool. The software is written in Java 1.4 and installers are provided for Linux/Unix, MacOS X and Windows XP. Conclusion: Dendroscope is a user-friendly program for visualizing and navigating phylogenetic trees, for both small and large datasets.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

Author: A Alexeyenko
A Hadgu
Aaron J. Mackey
AJ Enright
AJ Enright
CE Storm
CE Storm
Cecile Fairhead
CG Elsik
CM Zmasek
CM Zmasek
David S. Roos
DP Wall
EL Sonnhammer
EV Koonin
EV Koonin
F Chen
Feng Chen
H Hegyi
J Gouzy
J Magidson
JD Thompson
Jeroen K. Vermunt
JK Vermunt
JK Vermunt
KP O'Brien
L Li
LB Koski
M Remm
RF Doolittle
RL Tatusov
RL Tatusov
RL Tatusov
RL Tatusov
S Bandyopadhyay
S Henikoff
S Van Dongen
SF Altschul
SL Hui
T Hulsen
TF Deluca
WM Fitch
WM Fitch
Y Lee
Y Qu
Publication venue: Public Library of Science
Publication date: 01/01/2007

Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity > 80%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By way of using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications.
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology
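
For readers unfamiliar with the two quantities traded off in this abstract, the sketch below computes sensitivity and specificity from binary calls. It assumes a known truth vector, which is exactly what the paper says is unavailable at genomic scale; it therefore illustrates only the definitions and is not Latent Class Analysis. The function name and the toy data are invented for this example.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for paired binary outcomes.

    Assumes `truth` contains at least one positive and one negative;
    otherwise a division by zero occurs.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # true positives
    fn = sum(1 for p, t in pairs if not p and t)      # false negatives
    tn = sum(1 for p, t in pairs if not p and not t)  # true negatives
    fp = sum(1 for p, t in pairs if p and not t)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical calls for five candidate ortholog pairs.
    predicted = [1, 1, 0, 0, 1]
    truth = [1, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(predicted, truth)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A method that calls more pairs orthologous tends to raise the first number and lower the second, which is the trade-off the abstract reports between BLAST-based and tree-based methods.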

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Tilburg University Repository

Getting Started in Structural Phylogenomics

Author: B Qian
CE Jones
CM Zmasek
D Baker
D Brown
DJ Zwickl
ED Scheeff
F Delsuc
I Friedberg
JA Eisen
K Sjölander
Kimmen Sjölander
ML Green
MY Galperin
N Goldman
N Krishnamurthy
N Krishnamurthy
O Goldenberg
Olga Troyanskaya
RC Edgar
S Sankararaman
SE Brenner
Publication venue: Public Library of Science
Publication date: 01/01/2010

Crossref

Directory of Open Access Journals

PubMed Central

Three-Dimensional Phylogeny Explorer: Distinguishing paralogs, lateral transfer, and violation of "molecular clock" assumption with 3D visualization

Author: AJ Saldanha
Christopher Lee
CM Zmasek
CS Parr
DL Swofford
DL Wheeler
EV Koonin
G Trooskens
JD Retief
M Stallmann
MJ Sanderson
Namshin Kim
PL Lott
R Chenna
RD Page
RL Tatusov
RL Tatusov
RL Tatusov
RL Tatusov
S Kumar
SW Graham
Y Zhai
Z Du
Publication venue: BioMed Central
Publication date: 01/06/2007

Abstract. Background: Construction and interpretation of phylogenetic trees has been a major research topic for understanding the evolution of genes. Increases in sequence data and complexity are creating a need for more powerful and insightful tree visualization tools. Results: We have developed 3D Phylogeny Explorer (3DPE), a novel phylogeny tree viewer that maps trees onto three spatial axes (species on the X-axis; paralogs on Z; evolutionary distance on Y), enabling one to distinguish at a glance evolutionary features such as speciation; gene duplication and paralog evolution; lateral gene transfer; and violation of the "molecular clock" assumption. Users can input any tree on the online 3DPE, then rotate, scroll, rescale, and explore it interactively as "live" 3D views. All objects in 3DPE are clickable to display subtrees, connectivity path highlighting, sequence alignments, gene summary views, and so on. To illustrate the value of this visualization approach for microbial genomes, we also generated 3D phylogeny analyses for all clusters from the public COG database. We constructed tree views using well-established methods and graph algorithms. We used Scientific Python to generate VRML2 3D views viewable in any web browser. Conclusion: 3DPE provides a novel method for projecting phylogenetic trees into 3D space, together with a web-based implementation with live 3D features for reconstructing phylogenetic trees of the COG database.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses

Author: A Barakat
A Stamatakis
AJ Drummond
Ben C Stöver
CM Zmasek
D Huson
D Richardson
DR Maddison
F Chevenet
GE Jordan
H Shimodaira
J Müller
J Sampedro
K Müller
Kai F Müller
LB Zhang
LM Zahn
M Holder
M Pagel
MV Han
RDM Page
RK Jansen
S Kumar
S Whelan
WP Maddison
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Today it is common to apply multiple potentially conflicting data sources to a given phylogenetic problem. At the same time, several different inference techniques are routinely employed instead of relying on just one. In view of both trends it is becoming increasingly important to be able to efficiently compare different sets of statistical values supporting (or conflicting with) the nodes of a given tree topology, and to merge these into a meaningful representation. A tree editor supporting this should also allow for flexible editing operations and be able to produce ready-to-publish figures. Results: We developed TreeGraph 2, a GUI-based graphical editor for phylogenetic trees (available from http://treegraph.bioinfweb.info). It allows automatically combining information from different phylogenetic analyses of a given dataset (or from different subsets of the dataset), and helps to identify and graphically present incongruences. The program features versatile editing and formatting options, such as automatically setting line widths or colors according to the value of any of the unlimited number of variables that can be assigned to each node or branch. These node/branch data can be imported from spreadsheets or other trees, be calculated from each other by specified mathematical expressions, filtered, copied from and to other internal variables, be kept invisible or set visible and then be freely formatted (individually or across the whole tree). Beyond typical editing operations such as tree rerooting and ladderizing or moving and collapsing of nodes, whole clades can be copied from other files and be inserted (along with all node/branch data and legends), but can also be manually added and, thus, whole trees can quickly be manually constructed de novo. TreeGraph 2 outputs various graphic formats such as SVG, PDF, or PNG, useful for tree figures in both publications and presentations. Conclusion: TreeGraph 2 is a user-friendly, fully documented application to produce ready-to-publish trees. It can display any number of annotations in several ways, and permits easily importing and combining them. Additionally, a large number of editing and formatting operations are available.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

Author: A Bateman
A Levasseur
AK Wright
BE Engelhardt
CA Paulding
CM Zmasek
D Barker
D Durand
DH Huson
DHD Warren
J Felsenstein
J McCarthy
J Ruan
JD Thompson
JF Dufayard
JS Farris
Julie D Thompson
L Arvestad
N Krishnamurthy
O Sakarya
P Gouret
Philippe Gouret
Pierre Pontarotti
RG Beiko
T Blomme
T Dobzhansky
TJ Hubbard
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. Results: Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other. Conclusion: PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.

Crossref

HAL AMU

Springer - Publisher Connector

Directory of Open Access Journals

HAL-Inserm

PubMed Central

FastBLAST: Homology Relationships for Millions of Proteins

Author: A Marchler-Bauer
AA Schaffer
Adam P. Arkin
BE Suzek
Cecile Fairhead
CH Wu
CM Zmasek
D Wilson
F Pearl
H Mi
I Letunic
JD Selengut
LB Koski
M Remm
MN Price
Morgan N. Price
NJ Mulder
Paramvir S. Dehal
PS Dehal
R Durbin
RD Finn
RL Tatusov
S Yooseph
SF Altschul
W Gish
W Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/principal findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection

Author: A Kuzniar
C Vogel
CE Storm
CE Storm
CM Zmasek
EV Kriventseva
EW Sayers
F Delsuc
G Ostlund
M Ashburner
M Bashton
M Levitt
M Pellegrini
M Remm
R Jothi
RD Finn
RD Finn
RL Tatusov
RT van der Heijden
Timothy H Wu
Ting-wen Chen
TJ Hubbard
Wailap V Ng
Wen-chang Lin
WM Fitch
WM Fitch
Z Fu
Z Fu
Publication venue: BioMed Central
Publication date: 01/10/2010

Abstract. Background: Orthologs are genes derived from the same ancestral gene locus after speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions. Therefore, ortholog identification is useful in the annotation of newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires a great deal of effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of the lower sequence similarity. Therefore, an efficient ortholog detection method that can deal with a large number of distantly related genomes is desired. Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), is created on the basis of domain architectures. Supported by domain composition, which is usually directly related to protein function, DODO could facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups with much reduced complexity. Here DODO is shown to detect orthologs between two genomes in a considerably shorter period of time than traditional reciprocal-best-hit methods, and the difference is more significant when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases. Conclusions: DODO provides a new efficient pipeline for detection of orthologs in a large number of genomes. In addition, a database established with DODO is also easier to maintain and could be updated relatively effortlessly. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
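
The two-step idea in this abstract (group proteins by domain architecture first, then look for orthologs only within each group) can be sketched as follows. This is a hedged illustration of why grouping reduces the number of comparisons, not the DODO pipeline itself: the protein identifiers, domain names, and the functions `group_by_architecture` and `candidate_pairs` are all hypothetical, and the within-group ortholog scoring step is omitted.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input: protein id -> (genome, ordered domain architecture).
PROTEINS = {
    "hs1": ("human", ("SH3", "SH2", "Kinase")),
    "hs2": ("human", ("Kinase",)),
    "mm1": ("mouse", ("SH3", "SH2", "Kinase")),
    "mm2": ("mouse", ("Kinase",)),
    "mm3": ("mouse", ("Kinase",)),
}


def group_by_architecture(proteins):
    """Step 1: bucket proteins that share the same domain architecture."""
    groups = defaultdict(list)
    for protein_id, (genome, domains) in proteins.items():
        groups[tuple(domains)].append((genome, protein_id))
    return groups


def candidate_pairs(groups, genome_a, genome_b):
    """Step 2 (candidate generation only): pair proteins from two genomes
    within each architecture group. A real pipeline would then score
    these pairs to pick orthologs; that scoring is not shown here."""
    pairs = []
    for members in groups.values():
        from_a = sorted(pid for genome, pid in members if genome == genome_a)
        from_b = sorted(pid for genome, pid in members if genome == genome_b)
        pairs.extend(product(from_a, from_b))
    return sorted(pairs)


if __name__ == "__main__":
    groups = group_by_architecture(PROTEINS)
    for pair in candidate_pairs(groups, "human", "mouse"):
        print(pair)
```

In this toy data set an all-against-all comparison between the two genomes would consider 2 × 3 = 6 pairs, while grouping by architecture leaves 3 candidates to evaluate.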

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central